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Abstract 

This paper proposes a general class of regression models for continuous proportions when the 
data contain zeros or ones. The proposed class of models assumes that the response variable 
has a mixed continuous-discrete distribution with probability mass at zero or one. The beta 
distribution is used to describe the continuous component of the model, since its density has 
a wide range of different shapes depending on the values of the two parameters that index 
the distribution. We use a suitable parameterization of the beta law in terms of its mean and 
a precision parameter. The parameters of the mixture distribution are modeled as functions 
of regression parameters. We provide inference, diagnostic, and model selection tools for this 
class of models. A practical application that employs real data is presented. 

Keywords and Phrases: Continuous proportions; Zero-or-one inflated beta distribution; 
Fractional data; Maximum likelihood estimation; Diagnostics; Residuals. 



1 Introduction 

Statistical modeling of continuous proportions has received close attention in the last few years. 
Some examples of proportions measured on a continuous scale include the fraction of income 
contributed to a retirement fund, the proportion of weekly hours spent on work-related activities, 
the fraction of household income spent on food, the percentage of ammonia escaping unconverted 
from an oxidation plant, etc. Usual regression models, such as normal linear or nonlinear regression 
models, are not suitable for such situations. Different strategies have been proposed for modeling 
continuous proportions that are perceived to be related to other variables. Regression models 
that assume a beta distribution for the response variable are of particular interest. 

It is well known that the beta distribution is very flexible for modeling limited range data, 
since its density has different shapes depending on the values of the two parameters that index 
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the distribution: left-skewed, right-skewed, "C/," "J," inverted "J," and uniform (see Johnson, 
Kotz & Balakrishnan, 1995, §25.1). Beta regression models have been studied by Kieschnick & 
McCullough (2003), Ferrari & Cribari-Neto (2004), Espinheira, Ferrari & Cribari-Neto (2008a, 
2008b), Paohno (2001), Smithson & Verkuilen (2006), Korhonen et al. (2007), Simas, Barreto- 
Souza & Rocha (2010), and Ferrari & Pinheiro (2010), among others. 

Oftentimes, proportions data include a non-negligeable number of zeros and/or ones. When 
this is the case, the beta distribution does not provide a satisfactory description of the data, since 
it does not allow a positive probability for any particular point in the interval [0,1]. A mixed 
continuous-discrete distribution might be a better choice. This approach has been considered 
by Ospina & Ferrari (2010), who used the beta law to define the continuous component of the 
distribution. The discrete component is defined by a Bernoulli or a degenerate distribution at 
zero or at one. The proposed distributions are usually referred to as zero-and-one inflated beta 
distributions (mixture of a beta and a Bernoulli distribution) and zero-inflated beta distributions 
or one-inflated beta distributions (mixture of a beta and a degenerate distribution at zero or at 
one, depending on the case). 

This paper is concerned with a general class of regression models for modeling continuous 
proportions when zeros or ones appear in the data. Such class of models is tailored for the situation 
where only one of the extremes (zero or one) is present in the dataset. We shall consider a mixture 
of a beta distribution and a degenerate distribution at a fixed known point c, where c E {0, 1}. The 
beta distribution is conveniently parameterized in terms of its mean and a precision parameter. 
We shall allow the mean and the precision parameter of the beta distribution and the probability 
of a point mass at c to be related to linear or non-linear predictors through smooth link functions. 
Inference, diagnostic, and selection tools for the proposed class of models will be presented. Closely 
related to our work are the papers by Hoff (2007) and Cook, Kieschnick & McCullough (2008). Our 
model, however, is more general and convenient than those proposed by these authors. It relaxes 
linearity assumptions, allows all the parameters of the underlying distribution to be modeled as 
functions of unknown parameters and covariates, and uses a suitable parameterization of the beta 
law. Unlike their papers, our article offers a comprehensive framework for the statistical analysis 
of continuous data observed on the standard unit interval with a point mass at one of its extremes. 

The paper unfolds as follows. Section 2 presents a general class of zero-or-one infiated beta 
regression models. Section 3 discusses maximum likelihood estimation. Section 4 is devoted to 
diagnostic measures. Section 5 contains an application using real data and concluding remarks 
are given in Section 6. Some technical details are collected in two appendices. 



2 Zero-or-one inflated beta regression models 

The beta distribution with parameters /i and (0 < /i < 1 and (j) > 0), denoted by B{fi,(j)), has 
the density function 

where r(-) is the gamma function. The parameterization employed in (j2.ip is not the usual one, 
but it is suitable for modeling purposes. 

If 7/ ~ B{fi,(j)), then E(y) = fi and Var(y) = ^(1 — + 1)- Hence, fj, is the distribution 

mean and (j) plays the role of a precision parameter, in the sense that, for a fixed fi, the larger 
the value of (p, the smaller the variance of y. Since the beta distribution is very flexible, allowing 
a wide range of different forms, it is an attractive choice for modeling continuous proportions. A 
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possible shortcoming is that it is not appropriate for modehng datasets that contain observations 
at the extremes (either zero or one). Our focus is on the case where only one of the extremes 
appears in the data — a situation often found in empirical research. It is then natural to model 
the data using a mixture of two distributions: a beta distribution and a degenerate distribution 
in a known value c, where c equals zero or one, depending on the case. Under this approach, 
we assume that the probability density function of the response variable y with respect to the 
measure generated by the mixtur^ is given by 

bic(y;a,/i,0) = J V - c, ^2.2) 

\{l-a)f{y■,^l,(|,), ifyG(0,l), 

where f{y]^,4)) is the beta density ()2.ip . Note that a is the probability mass at c and represents 
the probability of observing zero (c = 0) or one (c = 1). If c = 0, the density (j2.2p is called a 
zero-inflated beta distribution, and if c = 1, the density is called a one-inflated beta distribution. 
The mean of y and its variance can be written as 

E(y) = ac + (1 - a)/i, 

Var(y) = (1 - «)^^^^ + «(! - a)(c - m)'. 

Note that E(?/) is the weighted average of the mean of the degenerate distribution at c and the 
mean of the beta distribution B{^, (j)) with weights a and 1 — a. Also, E(y | y S (0, 1)) = /x and 
Var(y|j/ € (0,1)) = ^(1 — /u)/(l + (j)). Other properties of this distribution can be found in 
Ospina &; Ferrari (2010). 

A general class of zero-or-one inflated beta regression models can be defined as follows. Let 
yi, - ■ ■ ,yn be independent random variables such that each yt, for t = 1, . . . , n, has probability 
density function (j2.2p with parameters a = at, /J, = fit, and 4> = 4>t. We assume that at, fit, and 
0t are defined as 

hi{at) = Tfit = fi{vt,p), 

h2{fit) = mt = f2{xt,P), (2.3) 
h'iickt) = mt = h{zt,-i), 

where p = {pi, . . . , pp)~^ , (3 = (/3i, . . . , /3fc)^ and 7 = (71, . . . , 7^)"'' are vectors of unknown 
regression parameters; {p + k + m < n), rfi = {rfu, . . . ,rfiny , rf2 = {ri2i, ■ ■ ■ ,ri2n)~^ , and 
r/3 = {rfsi, . . . ,rf3n)~^ are predictor vectors; and fi{-,-), /2(t)> s-'^d /sIt) linear or nonlin- 
ear twice continuously differentiable functions in the second argument, such that the derivative 
matrices V = drfi/dp, X = drj2/df3, and Z = drj^/d^ have ranks p, k and m, respectively, for all 
p, 13, and 7. 

Here vt = {vti, . . . ,vtp'), xt = {xti, . . . ,xtk'), and zt = (zti, . . . , ztm') are observations on 
p' + k' + m' known covariates. We also assume that the link functions hi : (0, 1) — )• M, /12 : 
(0,1) —7- M, and /13 : (0, +00) — )• M are strictly monotonic and twice differentiable. Various 
different link functions may be used. For fi and a one may choose h2{p) = log{/i/(l — /i)} (logit 
link); h2{p) = <I>^^(/i) (probit link), where $(•) denotes the standard normal distribution function; 
/i3(/x) = log{— log(l — p)} (complementary log-log link); and h2{p) = — log{— log(//)} (log-log 



^The probability measure P, corresponding to this distribution, defined over the measure space ((0, 1) U {c}, 23), 
where 23 is the class of all Borelian subsets of (0, 1) U {c}, is such that P « X + 5c, with A representing the Lebesgue 
measure, and Sc is such that Sc{A) — 1 if c £ A and Sc{A) — if c ^ A and yl G 23. 
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link), among others. Possible specifications for are h^{(t)) = log(/> (log link) and h-i{(j)) = 
(square-root link). 

Model ()2.2p - (j2.3p has a number of interesting features. The variance of yt is a function of 
(at, /Lit, 4>i) and, as a consequence, of the covariates values. Hence, non-constant response variances 
are naturally accommodated by the model. Also, the role that the covariates and the parameters 
play in the model is clear. For example, suppose c = 0. In this case, vt and p affect Pr(yf = 0), 
and /3 control E(2/t | G (0, 1)), and zt and 7 influence the precision of the conditional distribution 
of j/t, given that yt G (0, 1). This feature can be very useful for modeling purposes. For instance, 
if the response is the individual mobile communications expenditure proportion (MCEP), model 
(j2.2p - (j2.3p allows the researcher to take into account that some individuals do not spend at all 
on MCEP and to separately assess the effects of the heterogeneity among consumers and non- 
consumers of mobile communications (in this connection, see Yoo, 2004). 

Special cases. Model (|2.2l) - (l2.3p embodies two general classes of models: the zero-inflated beta 
regression model (c = 0) and the one-inflated beta regression model (c = 1), the first of which 
is suitable when the data include zeros and the second, when ones appear in the dataset. Each 
of them leads to a corresponding linear model when the predictors are linear functions of the 
parameters. In this case, we have p = p' , k = k' , m = m', hi{at) = vj p, h2{pt) = and 
h3{(t)t) = zjj. Here, V = {vJ , . . . ,v'J^y , X = {xl,...,xiy , and Z = {zj,...,zjy . Also, the 
nonlinear beta regression model (Simas, Barreto-Souza & Rocha, 2010, and Ferrari & Pinheiro, 
2010) is a limiting case of our model obtained by setting at = a — )• 0. If, in addition, the predictor 
for pt is linear and (j)t is constant through the observations, we arrive at the beta regression model 
defined by Ferrari & Cribari-Neto (2004). 

3 Likelihood inference 

The likelihood function for 9 = (p"'', /?"'', 7^)''' based on a sample of n independent observations is 

n 

L{d) = Wh\c{yuat,pu(^) = Li{p)L2{p,i), (3.1) 
t=i 

where 

n 

^i(/>)=n"^^'^'*^(i-"*)'"'^^^^'*^ 
t=i 

^2(/3,7)= n fiVt'^l^t^^t)^ 
i:yte(o,i) 

with ll^(yt) being an indicator function that equals 1 if yt S A, and 0, if yt ^ A. Here, at = 
hi^{r]it), pt = h2^{r]2t), and (/)t = h^^{i]3t), as defined in (j2.3p . are functions of p, (3, and 7, 
respectively. Notice that the likelihood function L{9) factorizes in two terms, the first of which 
depends only on p (discrete component), and the second, only on (/3,7) (continuous component). 
Hence, the parameters are separable (Pace &: Salvan, 1997, p. 128) and the maximum likelihood 
inference for (/3, 7) can be performed separately from that for p, as if the value of p were known, 
and vice-versa. 

The log-likelihood function is given by 

n 

i{e) = h{p) + e2{(3,^) = ^et{at)+Yl £t{pt,<t>t), (3.2) 

«=1 t:yte{0,l) 
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where 

Itiat) = \,}{yt)\ogat + (1 - log(l - at), (3.3) 

^t{^^u4>t) = iogr(0t) - iogr(/it</.t) - iogr((i - ^lt)4>t) + {iit4>t - i)yl + - 2)yJ, (3.4) 

= \og{yt/{l - yt)] and y\ = log(l - yt) if G (0, 1), and yj" = and y\ = otherwise. We 
have the fohowing conditional moments of yl and y\ : 

/X* = E(y; I G (0, 1)) = V(/it</'t) " ^'((l " Mt)0i), 

= E(yJ I yt G (0, 1)) = ^{{l - f,t)cl)t) - ^{(pt), 
v*t = Var(y; | yt G (0, 1)) = ^'{fitcj^t) - V''((l " f^t)<Pt), (3.5) 
vl = Var(yJ | yt G (0, 1)) = ^'{{1 - fit)<pt) - i^'i^t), 
c*t = Cov(y;,yJ | yt G (0, 1)) = -^'{{1 - fit)(t>t), 

with denoting the digamma function^ Notice that £i{p) represents the log-hkehhood function 
of a regression model for binary responses, in which the success probability for the tth observation 
is at = h^^{riit) (McCullagh & Nelder 1989, §4.4.1). On the other hand, £2(/3,7) is the log- 
likelihood function for (/3, 7) in a nonlinear beta regression model based on the observations 
that fall in the interval (0,1). Hence, the maximum likelihood (ML) estimation for this model 
can be accomplished by separately fitting a binomial regression model to the indicator variables 
^{c}(yt)) ^ = 1, . . . ,n, and a nonlinear beta regression model to the observations yt G (0, 1), 
for t = 1, . . . , n. 

The score function, obtained by the differentiation of the log-likelihood function with respect 
to the unknown parameters (see (jA.ip - ()A.3P : Appendix A), is given by 

U{9) = (f/»T,[/^(/3,7)^,t/^(/3,7)"')^, 

where 

U,{p) = V'^ADA*{y''-a), 
C/^(/3,7) = X'^iln - Y'^)T^{y* - ^*), (3.6) 
C/^(/3, 7) = zT(/„ - Y')H[M{y* - p*) + (yt - ^t)]. 

Here, y* = (yi,...,y.*)^, yt = (y|, . . . , yl)^, y^ = (]l{c}(yi), . . . , ll{c}(yn))^, /i* = (/xj, . . . , /i^)^, 
//t = . . . , filt)'^ , and a = (ai,...,a„)^ are n-vectors and Ai = diag(/ii, . . . , /u„), A = 
diag(l/ai,...,l/a„), A* = diag(l/(l - ai),...,l/(l - a„)), D = diag{l/h[{ai), 
...,l/h[{an)), $ = diag((/)i,..., (/.„), T = diag(l//i'2(/ii), . . . , l//i'2(//„)), = diag(l//i!5((Ai), 
. . . , l/h'^{(pn)), and = diag(ll{j;j(yi), . . . , ll|c}.(y„)) are nx n diagonal matrices. Also, /„ repre- 
sents the nx n identity matrix. The maximum likelihood estimators (MLEs) of p and {(3'^ 
are obtained as the solutions of the nonlinear systems Up{p) = and (C/^(/3, 7)''', [/^(/S, 7))"'' = 0. 
There are not closed form expressions for these estimators, and their computations should be 
performed numerically using a nonlinear optimization algorithm, e.g., some form of Newton's 
method (Newton-Raphson, Fisher's scoring, BHHH, etc.) or a quasi-Newton algorithm such as 
BFGS. An iterative algorithm for maximum likelihood estimation is presented in Appendix [Bl 
For more details on nonlinear optimization, see Press et al. (1992). 

^The digamma function is defined as i^{x) — dlogr(x)/da;, x > 0. 
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From the observed information matrix given in (TOll . it can be shown that the Fisher infor- 
mation matrix is 

K{e)= I Kpp Kp^ 1 , (3.7) 





















K^I3 





where Kpp = V"^WiV, Kp^ = X^W2X, K^p = KJ^ = X^V^^Z, K^^ = Z^^N^Z with Wi = 
{A* + ^)p2, Wa = ^T{V*A*'^}T^, W3 = T{^{MV* + C)A*-^}H, and W4 = H{{M^V* + 
2M.C + yt)^*-i}_ff. Notice that Kpp = Kp^^ = and Kp^ = K^p~^ = 0, thus indicating that 
the parameters 7 and (/?"'', 7''')''' are globally orthogonal (Cox & Reid, 1987) and their MLEs, p 
and (^ ,7^)^, are asymptotically independent. 

The inverse of Fisher's information matrix is useful for computing asymptotic standard errors 
of MLEs. From (j3.7p and a standard formula for the inverse of partitioned matrices (Rao, 1973, 
p. 33), we have 

(KPP \ 
K{e)-^ = K^f^ , (3.8) 

\ K^^ K"i"<) 

wheveKPP = (V"^WiV)-\ K^l^ = {X^ {W2-yV^Z{Z^V^ 4^Z)~^ Z^V^s)X}-^ , K^^ = {K'^'^Y = 
-{Z'^W4^Z)-^Z'^W^X{X~^ iy^2 - Mf^Z{Z'^W4^Z)-^Z^W^)X)-^, and K^^ 
{Z'^W^Z)-^ + {Z'^WiZ)-^Z'^W^X{X"'{W2 - W^Z{Z'^WiZ)-^Z'^Ws)X)-^ 
X'^^sZ X (ZTW4Z)-i. 



Fitting the model using the GAMLSS implementation. The zero-or-one inflated beta 
distribution has been incorporated in the GAMLSS framework (Rigby & Stasinopoulos, 2005); 
see Ospina (2006). GAMLSS allows the flexible modeling of each of the three parameters that 
index the distribution using parametric terms involving linear or nonlinear predictors, smooth 
nonparametric terms (e.g., cubic splines or loess), and random effects. Maximum (penalized) 
likelihood estimation is approached through a Newton-Raphson or Fisher scoring algorithm with 
the backfitting algorithm for the additive components. Our approach consists of an application of 
the gamlss functions, which are fully documented in the gamlss package (Stasinopoulos & Rigby, 
2007; see also http : //www. gamlss . org). The structure of the gamlss functions is familiar to 
readers who are used to the R (or S-Plus) syntax (the glm function, in particular). The set of 
gamlss packages can be freely downloaded from the R library at http://www.r-project.org/. 

Large sample inference. If the model specified by (j2.2p and (j2.3p is valid and the usual reg- 
ularity conditions for maximum likelihood estimation are satisfied (Cox & Hinkley, 1974, p. 107), 
the MLEs of 9 and K{e), d = (^^^,7)^ and K(e), respectively, are consistent. Assuming that 

I{9) = lim.n^ca{n'~^ K (6)} exists and is nonsingular, we have ^/n{6 — 0) —> Mp+k+rniO, I{6)^^), 

where — >• denotes convergence in distribution. Note that, if Oi denotes the /th component of ^, it 
follows that 

{9i-9i)[K{9fy"^ ^ ^-(0,1), 

where K{9)^^ is the /th diagonal element of K{9)^'^. A rigorous proof of the aforementioned 
asymptotic result can be obtained by extending arguments similar to those given in Fahrmeir &; 
Kaufmann (1985). 

Let K{pY'^ , K{I3Y^, and K{^)^^ be the estimated sth, rth and i?th diagonal elements of KPP, 
K^f^ , and K"^"^ , respectively. The asymptotic variances of p^, and 7^ are estimated by K{py^ , 
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K{PY'^, and K{j)^^, respectively. If < <j < 1/2, and represents the ^th quantile of the A/'(0, 1) 
distribution, we have p^± zi_^/21k{pY')'^/^ , f3,^.± zi_^/2{K{(3Y''y/^, and 7^±Zi_,/2(i^(7)^^)^/2 
as the limits of asymptotic confidence intervals (ACT) for pg, f3r, and 7r, respectively, all with 
asymptotic coverage 100(1 — ?)%. Additionally, an approximate 100(1 — ?)% confidence interval 
for /X* = E(y), the mean response for the given covariates v and x, can be computed as 

(j? - z^_^i^ s.e.(i?), 'i? + s.e.(/?)) , 
where c = or c = 1, depending on the case; fi* = ca + (1 — a)p, with a = h^^{rji); 'jl = /i^^(??2); 

m =/i(^;p); m =f2{x;P); and 

s.e.(^) = ^J{{c-^i)/h[{a)}^v'^KPPv + {{l-a)/h'2{i2)}'^x'^Ki^^x. 

Here, K^^ and K^l^ respectively equal K^p and K^^, evaluated at 0. 

The likelihood ratio, Rao's score, and Wald's (W) statistics to test hypotheses on the param- 
eters can be calculated from the log-likelihood function, the score vector, the Fisher information 
matrix, and its inverse given above. Their null distributions are usually unknown and the tests 
rely on asymptotic approximations. In large samples, a chi-squared distribution can be used as 
an approximation to the true null distributions. For testing the significance of the ith. regression 
parameter that models p, one can use the signed square root of Wald's statistic, /3j/s.e.(/3j), where 
s.e. (Pi) is the asymptotic standard error of the MLE of /3j obtained from the inverse of Fisher's 
information matrix evaluated at the maximum likelihood estimates. The limiting null distribution 
of the test statistic is standard normal. Significance tests on the 7's and p's can be performed in 
a similar fashion. 

Finite-sample performance of the MLEs. We now present the results of two Monte Carlo 
simulation experiments in order to investigate the finite-sample performance of the MLEs. The 

first experiment evaluates the impact of the magnitude of the probability of zero response on the 
MLEs. Here, the sample size is n = 150 and we focus on the zero-inflated beta regression model 
with logit(//) = /3o + + ,52a;2 + /53a;3, log(0) = 70 + 71-^1 + 72-^2 + 73-23, and a being constant for 
all observations. The true parameter values were taken as Po = —1; Pi = !> P2 = —0.5, ^3 = 0.5, 
7o = 2, 7i = 1, 72 = 0.5, and 73 = 0.5. Here, we consider four different values for the probability 
of observing zero: a = 0.18, a = 0.32, a = 0.68, and a = 0.82. 

The second experiment considers the zero-inflated beta regression model with logit(a) = 
Po + Pivi + P2V2 + P3V3, logit(M) = /3o + Pixi + 132X2 + /33X3, and log(^) = 70 + 71-2^1 + 72-^2 + 73-23- 
Here, we evaluate the performance of the MLEs when the sample size increases. The true values 
of the /?'s and the 7's arc the same as in the first experiment. The true values of the p's are 
Po = —1, pi = 1, p2 = —0.5, and ps = 0.5. In this situation the sample sizes considered are 
n = 50, 150, and 300. 

The explanatory variables vi,xi, and zi were generated as independent draws from a standard 
normal distribution. The covariates V2,X2, and Z2 were generated from the Poisson distribution 
with unit mean, and f3,X3, and Z3 were generated from the Binomial(0.2, 5) distribution. The 
total number of Monte Carlo replications was set at 5000 for each sample size. All simulations 
were carried out in R (Ihaka &; Gentleman, 1996). Computations for fitting inflated beta regression 
models were performed using the gamlss package. The MLEs were obtained by maximizing the 
log- likelihood function using the RS algorithm (Rigby & Stasinopoulos, 2005). In order to analyze 
the results, we computed the bias and the root mean squared error (\/MSE) of the estimates. 
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Table [T] summarizes the numerical results of the first experiment. We note that, for a fixed 
sample size, the bias and the \/MSE of the estimators of the continuous component (/3's and 7's) 
increase with the expected proportion of zeros (a). This is to be expected, since the expected 
number of observations in (0, 1) decreases as a increases. Also, it is noteworthy that the bias and 
the VMSE of MLEs corresponding to the dispersion covariates tend to be much more pronounced 
when compared with the MLEs of the parameters that model the mean response. 



Table 1: Simulation results for the first experiment. 
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0.02892 
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0.03158 


0.13205 


0.03203 


0.14619 


0.07302 


0.30639 


0.20369 


0.51323 


72 


0.03088 


0.12514 


0.04288 


0.16232 


0.09854 


0.30874 


0.23140 


0.68394 


73 


0.02745 


0.15964 


0.01527 


0.17183 


0.07067 


0.32288 


0.19891 


0.69026 



Table [2] presents simulation results for the second experiment. For the smallest sample size 
considered (n = 50), the estimation algorithm failed to converge in 1.3% of the samples. For large 
sample sizes the algorithm converged for all the samples. The estimated biases of the MLEs of 
the /o's (parameters of the discrete component) are markedly high for small samples. This is not 
surprising; in standard logistic regression, the MLEs are considerably biased in small samples. 
For large samples, the bias of the /?'s is negligible. 

In the second experiment, the /3's and the 7's are essentially estimated from the observations 
in (0,1), which represent around 70% of the observations in our study. For all the sample sizes 
considered, the mean of the /3's and 7's are close to the corresponding true values. Also, all the 
root mean square errors decrease when the sample size increases, as expected. 

4 Diagnostics 

Likelihood-based inference depends on parametric assumptions, and a severe misspecification of 
the model or the presence of outliers may impair its accuracy. We shall introduce some types of 
residuals for detecting departures from the postulated model and outlying observations. Addi- 
tionally, we suggest some measures to assess goodness-of-fit. 

Residuals. Residual analysis in the context of the zero-or-one inflated beta regression model 
()2.ip - ()2.3p can be split into two parts. First, we focus on the residual analysis for the discrete and 
the continuous component of the model separately. For this purpose, we propose a standardized 
Pearson residual based on Fisher's scoring iterative algorithm for estimating p, /3, and 7. Second, 
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Table 2: Simulation results for the second experiment. 











1 ^fi 




ouu 


Estimator 


Bias 


Vmse 


Bias 


Vmse 


Bias 


Vmse 


Po 


-0.14554 


0.81179 


-0.02398 


0.38136 


-0.01263 


0.26529 


Pi 


-0.18009 


0.56946 


-0.04623 


0.25131 


-0.02501 


0.17076 


P2 


-0.13892 


0.66011 


-0.03437 


0.24178 


-0.01803 


0.16958 


P3 


0.09667 


0.55895 


0.02113 


0.24240 


0.01093 


0.16877 


^0 


0.00588 


0.16190 


0.00171 


0.07288 


0.00055 


0.05272 




0.00588 


0.10917 


0.00146 


0.04882 


0.00027 


0.03313 




-0.00211 


0.09761 


-0.00131 


0.05057 


0.00018 


0.02872 


% 


0.00140 


0.12356 


0.00104 


0.05264 


0.00063 


0.03049 


70 


0.12628 


0.45696 


0.04208 


0.25820 


0.02472 


0.18738 


71 


0.04197 


0.29273 


0.03485 


0.15767 


0.00959 


0.10359 


72 


0.09771 


0.39828 


0.02333 


0.17590 


0.01153 


0.10208 


73 


0.04934 


0.36516 


0.02266 


0.18366 


0.00473 


0.11669 



we define a randomized quantile residual as a global residual for the model, i.e., using information 
of the discrete and the continuous component simultaneously. 

To study departures from the error assumptions as well as the presence of outlying observations 
of the discrete component for the zero-or-one inflated beta regression model we define a version 
of the standardized Pearson residual as 

rg^ , 'mW-S.^ ^ « = V^.n. (4.1) 



St(l - at){l - htt) 

where hu is the tth diagonal element of the orthogonal projection matrix H defined in (|B.5p and 
indicates evaluation at the MLEs. Note that is a version of the ordinary residual obtained 
in ()B.4p that takes the leverage of the tth observation into account. Plots of the residuals against 
the covariates or the fitted values should exhibit no systematic trend. 

The conditional distribution of yt, given that yt E (0, 1), is a beta distribution B{iJ,t, (pt)- Ferrari 
and Cribari-Neto (2004) provided two different residuals (standardized and deviance) for the class 
of beta regression models. Espinheira et al. (2008b) proposed two new beta regression residuals 
and their numerical results favor one of the new residuals: the one that accounts for the leverages 
of the different observations. Here, we follow Espinheira et al. (2008b) to define a weighted version 
of the ordinary residual from Fisher's scoring iterative algorithm for estimating (3 when 7 is fixed 
(see Appendix [B]). Our proposed residual is defined as 

r% = , " f.yt^ (0, 1), (4.2) 

t}?(l-at)(l-Pii) 

where Pft is the tth diagonal element of the projection matrix P defined in (|B.13p . Note that r^^ 
is a version of the standardized Pearson residual obtained in ()B.14p that takes the leverage of the 
tth observation into account. 
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To assess the overall adequacy of the zero-or-one inflated beta regression model to the data 
at hand, we propose the randomized quantile residual (Dunn & Smyth, 1996). It is a randomized 
version of the Cox &; Snell (1968) residual and given by 

r1 = ^-\ut), t = l,...,n, (4.3) 

where <!>(•) denotes the standard normal distribution function, ut is a uniform random variable 
on the interval (at,5t], with at = Ymiyfy^'Qlc{yt;at,'ilt,<pt) and bt = Blc{yt;at,'j2t,4>t)- Here, 
Blc{yt;at,iJ.t,(pt) = atll[c,i](2/t) + (1 - <y-t)F{yt; ixt,(t>t), where F{-]iit,4>t) is the cumulative dis- 
tribution function of the beta distribution B{^f,'Pt)- In the zero-inflated beta regression model, 
Ut is a uniform random variable on (0, Sf ] if = and ut = 'SiQ{yt]at,^t,4't) if yt G (O^l)- 
the other hand, in the one-inflated beta regression model, Ut is a uniform random variable on 
[ati 1) if = 1 and ut = BIi(yt; at-, /2t, </>*) if yt G (0, 1). Apart from sampling variability in at, Jit, 
and 4>t the are exactly standard normal in {at, bt] and the randomized procedure is introduced 
in order to produce a continuous residual. The randomized quantile residuals can vary from one 
realization to another. In practice, it is useful to make at least four achievements. 

A plot of these residuals against the index of the observations (t) should show no detectable 
pattern. A detectable trend in the plot of some residual against the predictors may be suggestive 
of link function misspecification. Also, normal probability plots with simulated envelopes are a 
helpful diagnostic tool (Atkinson, 1985). Simulation results not presented here indicated that 
the randomized quantile residuals perform well in detecting whether the distribution assumption 
is incorrect. 

Global goodness-of-fit measure. A simple global goodness-of-fit measure is a pseudo R'^, say 
Rp, defined by the square of the sample correlation coefficient between the outcomes, yi, . . . , yn, 

and their corresponding predicted values, . . . ,//*, where fi'^ = E(yi) = cat + (1 ~ ott)V't- A 
perfect agreement between the y's and /i*'s yields R^ = 1. Other pseudo i?^'s are defined as 
Rf = 1- logL/logLo (McFadden, 1974) and Rl^ = 1 - {Lo/Lf/'' (Cox and SneU, 1989, p. 
208-209), where Lq and L are the maximized likelihood functions of the null model and the fitted 
model, respectively. The ratio of the likelihoods or log-likelihoods may be regarded as measures 
of the improvement, over the model with only three parameters {a, ji, and (p), achieved by the 
model under investigation. 

Influence measures. A well-known measure of the influence of each observation on the regres- 
sion parameter estimates is the likelihood displacement (Cook & Weisberg, 1982, Ch. 3). The 
likelihood displacement that results from removing the tth observation from the data is defined 

by 

LA = ^W-^(^(t))}, 

where d is the dimension of 6 and ^(j) is the MLE of obtained after removing the tth observation 
from the data. This definition does not consider that is actually split into two different types 
of parameters: the parameters of the discrete component and the parameters of the continuous 
component. Thus, it is more appropriate to consider the influence of the ith case on the estimation 
of p and (/3, 7) separately. Therefore, we propose the following statistics: 

LDf = ^{£i(p) - 4(P(t))}> i = 0, 1, . . . , n 

LDf = ^—{e20,j) - l20(t),%))}, t: yt G (0, 1). 



10 



Simple approximations for LDf' and LD^" are given by 

iOP - eg = t = 0, !....,„ 

see Cook and Weisberg (1982, p. 191) and Wei (1998, p. 102). The approximations above are 
based on the iterative scheme for evaluating the MLE of p, (3, and 7 (Appendix [B]) and Taylor 
expansions of li{p{t)) and ^2 (/3(t) , 7(t) ) around p and (/3,7), respectively. The quantities cfj and 
can be useful to highlight influential cases. In practice, we recommend the plotting of and 
c§ against t. 



Model selection. Nested zero-or-one inflated beta regression models can be compared via 
the likelihood ratio test, using twice the difference between the maximized log-likelihoods of a 
full model and a restricted model whose covariates are a subset of the full model. Information 
criteria, such as the generalized Akaike information criterion (GAIC), can be used for comparing 
non- nested models. It is defined as 

GAIC = D + dp , (4.4) 

where D = —21 is the global fitted deviance (Rigby &: Stasinopoulus, 2005), I is the maximized 
log-likelihood and d is the dimension of 0. It is possible to interpret the first term on the right- 
hand side of (j4.4p as a measure of the lack of fit and the second term as a "penalty" for adding 
d parameters. The model with the smallest GAIC is then selected. The Akaike information 
criterion AIC (Akaike, 1974), the Schwarz Bayesian criterion SBC (Schwarz, 1978), and the 
consistent Akaike information criterion (CAIC) are special cases of GAIC corresponding to p = 2, 
p = log(n), and p = log(n) + 1, respectively. 



5 An application 

This section contains an application of the zero-inflated beta regression model to real data. 
We analyze data on the mortality in traffic accidents of 200 randomly selected Brazilian mu- 
nicipal districts of the southeast region in the year 2002. The data were extracted from the 
DATASUS database available at www.datasus.gov.bt and IPEADATA database available at 
www . ipeadata . gov . br( The response variable y is the proportion of deaths caused by traffic 
accidents. The explanatory variables are the logarithm of the number of inhabitants of the 
municipality (Inpop), the proportion of the population living in the urban area {propurb), the 
proportion of men in the population (propmen), the proportion of residents aged between 20 and 
29 years {prop2029), and the human development index of education of the municipal district 
(hdie). The main objective is to investigate the effect of the young population (prop2029) on 
the proportion of deaths caused by traffic accidents after controlling for potential confounding 
variables. We report the summary statistics on each of these variables in Table [3l 

Figure [l] presents a histogram and a box-plot of the response variable. The clump-at-zero 
(the bar with the dot above) in the histogram represents 39% of the data. We observe that 
the distribution of the data in (0,1) is asymmetric with an inverted "J" shape. Also, the box- 
plot reveals the presence of some outliers. Visual inspection suggests that a zero-inflated beta 
distribution may be a suitable model for the data. 
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Tabic 3: Sample STimmary statistics. 



variable 


mean 


s.d. 


min. 


1st qu. 


median 


3rd qu. 


max. 


Inpop 


9.4042 


1.3464 


7.3920 


8.3997 


9.0620 


10.0025 


16.1606 


propurb 


0.7173 


0.2044 


0.2042 


0.5887 


0.7445 


0.8862 


1.0000 


propmen 


0.5062 


0.0114 


0.4766 


0.4989 


0.5070 


0.5146 


0.5370 


prop2029 


0.1660 


0.0136 


0.1322 


0.1559 


0.1661 


0.1750 


0.2042 


hdie 


0.8236 


0.0587 


0.6060 


0.7955 


0.8325 


0.8640 


0.9330 
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y y 
Figure 1: Frequency histogram and box-plot of the observed proportions of deaths caused by 
traffic accidents. 



We consider a zero-inflated beta regression model with the following specification: 

logit(a) = po + pi Inpop + p2 propurb + p^ propmen + p4prop2029 + p^ hdei, 
logit(;[i) = Pq + Pi Inpop + /32 propurb + jS^ propmen + /34 prop2029 + f3^ hdei, 
log(0) =70 + 7i Inpop + 72 propurb + 73 propmen + 74 prop2029 + 75 hdei. 

Computations for fitting the model were carried out using the package gamlss (Stasinopoulus 
&; Rigby, 2007) in the R software package (Ihaka &: Gentleman, 1996) and the BEZI distribution 
implemented by Ospina (2006). The automatic procedure stepGAICAll.BO included in the 
gamlss package was used to perform model selection using the AIC. The following final model 
has been selected: 

logit(a) = Po + pi Inpop + p4prop2029 + ps hdei, 

logit(/x) = Pq + /3i Inpop + /34prop2029 + P5 hdei, (5.1) 
log(0) = 7o + 71 Inpop + 74prop2029 + 75 hdei. 

In fact, the value of the likelihood ratio statistic for testing Hq: 9 = (p2, Pa, /32, /^a, 72, 73) = 
versus a two-side alternative is A = 4.89 (p-value = 0.56), i.e., Hq is not rejected at the usual 
significance levels, and hence, propurb and propmen can be excluded from the model. 
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The parameter estimates and the corresponding standard errors for the final model are summa- 
rized in TableHl The pseudo i?^'s are Rp* = 0.597 and i^^R = 0.584, thus suggesting a reasonable 
fit of the model to the data. It is noteworthy that prop2029, which represents the proportion 
of the population aged between 20 to 29 years, has a positive effect on both the probability of 
observing at least one death due to traffic accidents (1 — a) and the conditional mean proportion 
of deaths due to this cause (fi). 

Table 4: Parameter estimates (est.), standard errors (s.e.), and p-values (p). 



a 


est. 


s.e. 


P 




intercept 


27.27 


4.63 


1.73e- 


-08 


Inpop 


-1.17 


0.27 


2.07e- 


-05 


prop2029 


-48.06 


17.46 


6.49e- 


-03 


hdei 


-11.34 


4.01 


5.13e- 


03 


M 


estimate 


s.e. 


P 




intercept 


-4.72 


1.20 


l.lle- 


-04 


Inpop 


-0.53 


0.05 


7.14e- 


-17 


prop2029 


27.68 


6.37 


2.33e- 


-05 


hdei 


3.10 


1.44 


3.28e- 


-02 




estimate 


s.e. 


P 




intercept 


9.46 


3.25 


4.08e- 


-03 


Inpop 


0.47 


0.10 


8.54e- 


-06 


prop2029 


-28.34 


16.63 


9.01e- 


-02 


hdei 


-6.70 


3.90 


8.78e- 


-02 



Figure [2] shows the plot of randomized quantile residuals against the case number (Figure 
[2]^a)) and the quantile-quantile plot of the ordered randomized quantile residuals against the 
corresponding quantiles of a standard normal distribution with a 95% envelope based on 100 
simulations (Figure EJb)). Figure [2ja) singles out observation 138 as possibly atypical, and 
Figure [2]^b) indicates that it lies in the threshold of the envelope. Case 138 is, therefore, worthy of 
further investigation. Notice that Figure[2]^b) clearly shows that the distribution of the randomized 
quantile residuals is asymmetric, and the usual thresholds (—2, 2) or (—3, 3) should be used with 
care. 

Diagnostic plots for the discrete component are given in Figures [3l^a)-{3]^c). The plot of 
against the case number (Figure [3)^a)) suggests that the residuals appear to be randomly scattered 
around zero and that there is no atypical observation. The plot of r^^ against St (Figure El^b)) 
is not suggestive of a lack of fit. The lower and upper bounds, corresponding to responses equal 
to zero and one, respectively, are typical of data with only two outcomes. Figure [3l^c) shows the 
plot of Cook statistics against the case number, from which observation 196 (an observation 
with the response variable equal to zero) is highlighted as influential. 

Diagnostic plots for the continuous component are given in Figures [3jd)-[3]^f). Figure [3]^d) 
shows the plot of r^^ against case number and suggests that the residuals are randomly scattered 
around zero. No observation is distinguished as atypical. The plot of against 'fit (Figure 
El^e)) is not suggestive of a lack of fit. Finally, the plot of against the case number indicates 
observations 2, 9, 85, and 138 as influential points for the continuous component. 

We have reestimated the model after removing the following sets of observations: {138}, {196}, 
and {2, 9, 85, 138}. Our purpose is to investigate the impact of the exclusion of observations 
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Figure 2: Quantile residual plots. 



highlighted in the diagnostic plots on the inferences based on the estimated model. Table [5] gives 
relative changes in the parameter estimates and the standard errors. Observation 138 alone is 
clearly influential for the estimation of the continuous component of the model, while observation 
196 solely impacts the inference on the discrete component, as expected. The joint exclusion of 
cases 2, 9, 85, and 138 produces substantial changes in the parameter estimates of the continuous 
component and only slight changes in the estimated model for the discrete part. Our findings 
suggest that the proposed diagnostic tools are helpful for detecting atypical observations and 
influential cases and are able to indicate the component of the model, discrete or continuous, that 
is aff'ected by each influential observation. 



6 Concluding remarks 

We developed a general class of zero-or-one inflated beta regression models that can be useful 
for practitioners when modeling response variables in the standard unit interval, such as rates or 
proportions, with the presence of zeros or ones. Explicit formulas for the score function. Fisher's 
information matrix, and its inverse are given. An iterative estimation procedure and its compu- 
tational implementation are discussed. Interval estimation for different population quantities is 
presented. We also proposed a set of diagnostic tools that can be employed to identify departures 
from the postulated model and the atypical and influential observations. These tools include 
pseudo i?^'s, different residuals, influence measures, and model selection procedures. One partic- 
ularly interesting feature of the proposed diagnostic analysis is that it allows the practitioner to 
separately identify influential cases on the discrete and the continuous components of the model. 
An application using real data was presented and discussed. 
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Figure 3: Diagnostic plots for the discrete component ((a)-(c)) and the continuous component 
((d)-(f)). 
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A Appendix A: Score vector and observed information matrix 

Score vector. The elements of the score vector are given by 

9Ps jr{ dat dr]it dps ^ - at) h[{at)dps' 

dl(0) ^ detitH,(l)t) dfii dr]2t , / * *^ 1 9V2t 

t:j/t€(0,l) t:Vfe(0,l) 

07ii ^ d(t)t dr]u O'jR ^ Km) d'yn v^-^^ 
for s = 1, . . . ,p; r = 1, . . . , fc; and R= 1, . . . , m. 
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Table 5: Parameter estimates (est.), standard errors (s.e.), p-values {p), relative changes in es- 
timates (rel. est.), and relative changes in standard errors (rel. s.e.) because of exclusions of 
observations (in percentage). 



n 

obs. 


parameter 


est. 


s.e 


P 


1 T 

rel. est. 


1 

rel. s.e 




Po 


27.26 


4.63 


0.00 


0.04 


—0.06 




Pi 


-1.17 


0.27 


0.00 


0.08 


-0.17 




P4 


-48.06 


17.46 


0.01 


-0.01 


0.01 




P5 


-11.34 


4.01 


0.01 


0.04 


0.00 




Po 


— 5.U5 


l.ii 


0.00 


—6.96 


Y.06 






-0.57 


0.06 


0.00 


-6.85 


-1.48 


138 


pi 


32.40 


6.42 


0.00 


-17.07 


-0.67 




/35 


2.93 


1.40 


0.04 


5.54 


2.64 




70 


1 / \ tin 

10. 00 


3.00 


0.00 


"1 i 1 ui\ 

— 12.60 


'7 ^7 A 

7.74 




71 


0.61 


0.11 


0.00 


—27.20 


—7.02 




74 


Art AG 

—43.46 


17.26 


0.01 


—53.34 


—3.78 




75 


—6.47 


3.83 


0.09 


3.52 


2.02 




Po 


30.78 


5.25 


0.00 


— 12.86 


— 13.34 




pi 


-1.38 


0.30 


0.00 


-18.15 


-12.33 




P4 


-51.62 


18.21 


0.01 


-7.41 


-4.30 




P5 


-12.68 


4.24 


0.00 


-11.80 


-5.95 




Po 


—4.72 


1.20 


0.00 


0.00 


0.00 




Pi 


-0.53 


0.06 


0.00 


0.00 


0.00 


196 


P4 


27.68 


6.38 


0.00 


0.00 


0.00 




/35 


3.10 


1.44 


0.03 


0.00 


0.00 




7o 






II III) 


1 1 III 1 


III! 




71 


0.48 


0.10 


0.00 


0.00 


0.00 




74 


-28.34 


16.64 


0.09 


0.00 


0.00 




75 


-6.70 


3.91 


0.09 


0.00 


0.00 




po 


26.97 


4.62 


0.00 


1.11 


0.17 




Pi 


-1.16 


0.27 


0.00 


1.17 


0.24 




P4 


-47.28 


17.45 


0.01 


1.63 


0.06 




P5 


-11.27 


4.00 


0.01 


0.67 


0.09 




Po 


-3.98 


1.11 


0.00 


15.69 


6.93 


2, 9, 85, 138 


/3i 


-0.58 


0.05 


0.00 


-8.90 


5.45 


/34 


29.71 


5.87 


0.00 


-7.33 


8.03 






2.29 


1.36 


0.09 


26.14 


5.94 




70 


7.48 


3.35 


0.03 


20.96 


-2.82 




71 


0.61 


0.14 


0.00 


-27.75 


-36.20 




74 


-29.35 


16.95 


0.08 


-3.57 


-1.86 




76 


-5.37 


4.17 


0.20 


19.85 


-6.70 



Observed information matrix. For s, s' = 1, . . . r, r' = 1, . . . , A;; and i?, i?' = 1, . . . , m, we have 



Jss' — 



dH{0)_ 



dpsdps' 



E 



-ii(o,i)(?/t) ^{c}{yt) 



1 



/ ^{c}{yt) - at 
y at{l - at) J [/ii("t) 

/ ll{c}(2/t) - at 1 
^\ at(l-at) [h[{at) 



{l-atr 

-hUat) 



1 



d'mt ] 

dpsdps' I 



d'^mt 



t:s/te(o,i) 



h[{at) 
dps dps' 



-h'^ip-t) \ 1 dV2t dr]2t 
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t:ate(o,i)l ^ 



'3t 



t:ate(o,i) 



t:»te(o.i) 



The observed Fisher information matrix can now be written as 




J{e) = ( Jpp Jp^ I , (A.4) 



where 



Jpp = V^{{A*^{In - Y^) + A^Y^) + A*A{Y'' - a*)V*V}V^ - [{y" - ay AV][V], 
Jpp = X^i^TV* + ST^{Y* - M*)}{In - y")r$A' + [{y* - - Y')T^][X], 

J^p = J:^p = -X^iiY* - M*) - ^MV* + C)}(/„ - Y')THZ, 
J^^ = Z^{H{M^V* + 2MC + V^) + {M{Y* - M*) + (F^ - Xt)}Q^2|^j^^ _ Y'')HZ 
+ [((y* - + (yt - ^t)T)(^^^ _ Y')H][Z], 

with a* = diag(ai,...,a„), Y* = diag(y^ . . . , y^), = diag(y]; , . . . , yj), X* = diag(^^ . . . , ^i^), M'^ = 
diag(/xj , . . . , ^Jj), and Q — diag(/i3 (0i), . . . , h'^{(j)n)); V is an nxpxp array with faces Vt = d'^rju/ dpdp^ , X 
is an nx fcx fc array with faces Xt = d'^ri2t/dpd/3^ , and Z is an nxmxm array with faces Zt = d^rjst/djdj^ , 
for t — 1, . . . ,n. The column muhiphcation for three-dimensional arrays is indicated by the bracket operator 
[•][•] as defined by Wei (1998, p. 



B Appendix B: Iterative algorithm for maximum likelihood es- 
timation 

MLEs for p and d = {fi^,(j))^ can be obtained by using a re- weighted least-squares algorithm. For p we 
have 

p(:m+l) ^ (y(m)T^^(m)^(m))-l^(m)^^(m)y^(m)^ m = 0, 1, . . . , (B.l) 

where Wi is defined in p.7p and 

y:^ =Vp + Wi-^ADA*{y'' ~a) (B.2) 

is a local modified dependent variable. This cycle is repeated until convergence is achieved. Rewriting this 
equation as 

yi=Vi-n+Wi-'ADA*iy' -a), (B.3) 

where r/i = /i(^;p) ^^'^ '''i — ^ "^P^ can interpret (jB.ip as an iterative process to fit a 

generalized linear model with design matrix V, systematic part hi{at) = rju, tth diagonal element of the 
variance function lWi~^ADA*]tt, t = 1, . . . ,n, and offset n. The offset quantity is subtracted, at each 
step, from the predictor rji. The iterative process (jB.ip may be performed, for instance, in the R package 
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by taking advantage of the library MASS (Venables & Ripley, 2002). This procedure allows us to extend 
the diagnostic results of ordinary regression models to the discrete component of the zero-or-one inflated 
beta regression model. Upon the converge of iterative process (jB.ip . we have p = (V^WiV)~^V^z, where 
z = Vp + 'W^^ADA*{y'' — a). The ordinary residual for this re- weighted regression is 

r* = Wy^(z -m)= W^^^^ADA*iy'' ~ a). 
Note that the tth. element of r* is 

^ - (B.4) 

^/at{l - at) 

By writing (V^WiV)p = V^z, it is possible to obtain the approximations E(r*) « and Var(r*) « 
(/„ — H), where /„ is the n x n identity matrix and 

tl = Wi^/^V{V^WiV)-^V^Wi^/'^ (B.5) 

is an orthogonal projection matrix onto the vector space spanned by the columns of V. The geometric 
interpretation of H as a projection matrix is discussed by Moolgavark et al. (1984). 

Now, let X, T, and W be the {2n x (fc + n)) and (2n x 2n), (2n x 2n) dimensional matrices 

respectively, and y^= ((y* — [J^iv* ^ /^*) + (y^ ^ Z-*^)]^) be an 1 x 2n auxiliary vector. The score 

vector corresponding to ?? = (/?^, 7^)^ can be written as 

U{d) = X^TY . (B.7) 
Fisher's information matrix for the parameter vector d is given by 

K{d) = X^WX . (B.8) 
Also, the iterative process for estimating d takes the form 

^(m+l) _ ^^(ni)T-^(m)^(m)^-l ^{m)-y^(m) y *{m) ^ (B.9) 

where y*(™) — Xd + W^^Ty and m = 0, 1, 2, . . . are the iterations that are performed until convergence, 

which occurs when the distance between iJ^^+i) and i?'-'"^ becomes smaller than a given, small constant. 
The procedure is initialized by choosing suitable initial values for j3 and 7. 

Assuming that 7 is known. Fisher's scoring iterative scheme used to estimating (3 can be written as 

where W2 is defined in p.7p and y2 = Xj5 + W2^^T$(y* — /^*), with m = 0, 1, 2, . . . . Upon convergence, 
we have 

P ^ {X^W^Xy^X^T, (B.ll) 

where t = Xp ^ W2 $(y* - M*)- Then, /3 in ^J\\ can be viewed as the least-squares estimate of 
P obtained by regressing t on X with weighting matrix W2. The ordinary residual of this re- weighted 
least-squares regression is 

r = (/„ - P)r = Wy'(T - XP) = W-^/'$f (y* - p). (B.12) 

Here, 

P = Wy'^{X'^W^X)-^X'^'wy'^ (B.13) 
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is a projection matrix. Note that if all the quantities are evaluated at the true parameter, E(r) = and 
Var(r) = (/„ — P). The P matrix is similar to the leverage matrix in standard linear regression models, and 
hence, we refer to it as the generalized leverage matrix. It is possible to show that /„ — P is symmetric, 
idempotent, and spans the residual r-space. This implies that a small 1 — Ptt indicates extreme points in 
the design space of the continuous component of the zero-or-one inflated beta regression model. Note that 
the tth element of r is 

n = ^S^- (B.14) 
V^(l-at) 
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